-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Fix issues in tableRowFromMessage #36425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @reuvenlax, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
e5e58c1 to
214e73d
Compare
Abacn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
|
@Abacn Upon adding a unit test, I realized that there are some more fundamental problems with this code (pre existing) - it corrupts certain types. Trying to fix this now. |
|
Assigning reviewers: R: @m-trieu for label java. Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
|
Fixes #33531 |
214e73d to
db2020b
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a significant refactoring to improve how BigQueryIO handles Protobuf messages, specifically to support field names like "f" which conflict with TableRow's internal API. The changes are extensive, introducing SerializableBiFunction to pass schema information, and refactoring TableRowToStorageApiProto to use a more robust setF-based approach for TableRow creation. The introduction of SerializableBiFunctions and new tests for various proto encodings are great additions. My review includes a few minor suggestions for code style and clarity in the new helper classes, and a question about a change in test coverage. Overall, this is a high-quality contribution that improves the robustness of BigQueryIO.
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces significant fixes to tableRowFromMessage to handle edge cases and improve data conversion correctness. Specifically, it addresses issues with BigQuery columns named "f" by introducing a fallback to using TableRow.setF, and it correctly decodes special BigQuery types (like DATE, DATETIME) from protocol buffers by leveraging schema information. The changes are extensive, refactoring many parts of the BigQueryIO write path to plumb through the necessary schema information, and introducing a SerializableBiFunction to formatting functions. The addition of SerializableBiFunctions provides useful utilities. The code changes look solid and are well-tested with new unit and integration tests. I have a couple of suggestions for improving exception messages and cleaning up test code.
...le-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOWriteTest.java
Outdated
Show resolved
Hide resolved
...le-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOWriteTest.java
Outdated
Show resolved
Hide resolved
| throw new RuntimeException( | ||
| "Not implemented yet " | ||
| + fieldDescriptor.getMessageType().getName() | ||
| + " PARTIAL NAME " | ||
| + fieldDescriptor.getMessageType().getName() | ||
| + " FIELD NAME " | ||
| + prefix | ||
| + " CLASS TYPE " | ||
| + fieldValue.getClass()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception message here seems to contain debugging information ("PARTIAL NAME", "FIELD NAME", "CLASS TYPE"). It would be better to provide a cleaner, more user-friendly message. Also, consider using UnsupportedOperationException instead of a generic RuntimeException for cases that are not implemented. This applies to similar throw new RuntimeException("Not implemented...") statements in this method for other types like BOOL, INT64, BYTES, and TIMESTAMP.
throw new UnsupportedOperationException(
"Converting BigQuery 'DOUBLE' from a protobuf message of type '"
+ fieldDescriptor.getMessageType().getName()
+ "' is not supported. Field: "
+ prefix);
...-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProtoIT.java
Outdated
Show resolved
Hide resolved
|
Thanks, understand there is a deeper issue in cc: @liferoad WDYT? |
|
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withFormatRecordOnFailureFunction(org.apache.beam.sdk.transforms.SerializableFunction) is still a workaround for this. So I think fixing with Beam 2.69.0 is not urgent. |
|
Agree. We will get this in after the cut. |
a7bc70c to
4141b42
Compare
|
@Abacn all tests now passing |
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces significant fixes and refactorings to tableRowFromMessage to address fundamental issues with data type conversions and field name conflicts. The changes correctly handle BigQuery columns named "f" by using setF and ensure proper decoding of special types like DATE and DATETIME by leveraging schema information during conversion from protocol buffers. The introduction of SerializableBiFunction and its helpers is a good way to evolve the API while maintaining backward compatibility. The code is also made more maintainable by centralizing conversion logic. I've found a couple of potential issues related to locale-sensitivity in number formatting and incorrect timestamp formatting that could lead to bugs. Overall, this is a solid improvement.
...ud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java
Outdated
Show resolved
Hide resolved
...ud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java
Show resolved
Hide resolved
Abacn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, understanding the need of plumbing schemainformation led to refactorings. Had one comments hopefully find useful. Have done review with schemainformation/getF fixes, still going through date time changes
...a/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
Outdated
Show resolved
Hide resolved
8f98821 to
3e65c16
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
(just run /gemini review again to check typos - to my experience the tool is good at finding this kind of issue)
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces significant and well-executed fixes to tableRowFromMessage, addressing fundamental issues with type decoding and handling of special field names. The introduction of SchemaInformation to guide the conversion from protocol buffers to TableRow is a crucial improvement for correctly handling BigQuery-specific types like DATE and DATETIME. The clever fallback mechanism to handle table columns named "f" by using setFlipstick is a great way to ensure backward compatibility while fixing the underlying issue. The refactoring of type conversion logic into a map of converters in TableRowToStorageApiProto greatly improves the code's structure and maintainability. The extensive addition of tests covering these fixes is commendable. I have one suggestion regarding a DateTimeFormatter implementation that could be improved.
This PR fixes multiple fundamental issues in tableRowFromMessage. This function is usually used in the dead-letter output from BigQueryIO, though it has other uses as well.
In addition to fixing these bugs, many tests were added.